Overview

Dataset statistics

Number of variables: 11
Number of observations: 987269
Missing cells: 27277
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 82.9 MiB
Average record size in memory: 88.0 B
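These overview figures follow from simple arithmetic on the DataFrame. Missing cells are counted over all 987269 × 11 cells, so 27277 / 10859959 ≈ 0.25%, which the report rounds to 0.3%. The memory figures are consistent with 8 bytes per cell: 11 columns × 8 B = 88 B per record, and 987269 × 88 B ≈ 82.9 MiB. The sketch below recomputes the block with pandas. It assumes shallow memory accounting (`deep=False`, index excluded), which the 8-bytes-per-cell figures suggest but the report does not confirm, and `overview_stats` is a hypothetical helper name.

```python
import pandas as pd


def overview_stats(df: pd.DataFrame) -> dict:
    """Hypothetical helper: recompute the 'Dataset statistics' block.

    Assumes shallow memory accounting (deep=False, index excluded); the
    profiler's own bookkeeping may differ slightly.
    """
    n_obs, n_vars = df.shape
    missing_cells = int(df.isna().sum().sum())
    duplicate_rows = int(df.duplicated().sum())
    total_bytes = int(df.memory_usage(index=False, deep=False).sum())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing_cells,
        # Percentage of all cells (rows x columns), not of rows.
        "missing_cells_pct": 100 * missing_cells / (n_obs * n_vars),
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100 * duplicate_rows / n_obs,
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_obs,
    }


# Toy example: one missing cell, one duplicated row, two float64 columns.
toy = pd.DataFrame({"a": [1.0, 2.0, 2.0, None], "b": [1.0, 2.0, 2.0, 4.0]})
print(overview_stats(toy))
```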

Variable types

NUM: 8
BOOL: 2
CAT: 1

Warnings

row_id is highly correlated with Unnamed: 0 and 1 other field (High correlation)
Unnamed: 0 is highly correlated with row_id and 1 other field (High correlation)
user_id is highly correlated with Unnamed: 0 and 1 other field (High correlation)
prior_question_elapsed_time has 23356 (2.4%) missing values (Missing)
Unnamed: 0 has unique values (Unique)
row_id has unique values (Unique)
user_answer has 274421 (27.8%) zeros (Zeros)

Reproduction

Analysis started: 2020-12-08 09:47:33.673751
Analysis finished: 2020-12-08 09:49:18.367265
Duration: 1 minute and 44.69 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 987269
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51061759.65
Minimum: 78225
Maximum: 101224061
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 78225
5th percentile: 6486992.4
Q1: 26890846
median: 50998218
Q3: 75022565
95th percentile: 97041437.6
Maximum: 101224061
Range: 101145836
Interquartile range (IQR): 48131719

Descriptive statistics

Standard deviation: 28483682.82
Coefficient of variation (CV): 0.5578280697
Kurtosis: -1.155367124
Mean: 51061759.65
Median Absolute Deviation (MAD): 24107039
Skewness: 0.01417549326
Sum: 5.041169239e+13
Variance: 8.113201871e+14
Monotonicity: Not monotonic
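The quantile and descriptive statistics above match standard definitions; for example CV = 28483682.82 / 51061759.65 ≈ 0.5578 and IQR = Q3 - Q1 = 75022565 - 26890846 = 48131719. The following is a minimal pandas sketch of how such a summary can be computed. It assumes the profiler follows pandas defaults (linearly interpolated quantiles, sample standard deviation with ddof=1, excess kurtosis) and that MAD means the median absolute deviation from the median; `describe_numeric` is a hypothetical helper, not the profiler's own code.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Hypothetical helper mirroring the report's numeric summary fields."""
    s = s.dropna()
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])  # linear interpolation by default
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    median = s.median()
    if s.is_monotonic_increasing:
        monotonicity = "Increasing"
    elif s.is_monotonic_decreasing:
        monotonicity = "Decreasing"
    else:
        monotonicity = "Not monotonic"
    return {
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": median,
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "kurtosis": s.kurt(),  # excess kurtosis (a normal distribution gives 0)
        "mad": (s - median).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "sum": s.sum(),
        "variance": s.var(),
        "monotonicity": monotonicity,
    }


print(describe_numeric(pd.Series([1, 2, 3, 4, 5])))
```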
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
18787089 | 1 | < 0.1%
78014865 | 1 | < 0.1%
96504141 | 1 | < 0.1%
41976140 | 1 | < 0.1%
37794122 | 1 | < 0.1%
92315976 | 1 | < 0.1%
9541752 | 1 | < 0.1%
26483807 | 1 | < 0.1%
2708973 | 1 | < 0.1%
33612096 | 1 | < 0.1%
Other values (987259) | 987259 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
78225 | 1 | < 0.1%
78226 | 1 | < 0.1%
78227 | 1 | < 0.1%
78228 | 1 | < 0.1%
78229 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
101224061 | 1 | < 0.1%
101224060 | 1 | < 0.1%
101224059 | 1 | < 0.1%
101224058 | 1 | < 0.1%
101224057 | 1 | < 0.1%

row_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 987269
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51061759.65
Minimum: 78225
Maximum: 101224061
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 78225
5th percentile: 6486992.4
Q1: 26890846
median: 50998218
Q3: 75022565
95th percentile: 97041437.6
Maximum: 101224061
Range: 101145836
Interquartile range (IQR): 48131719

Descriptive statistics

Standard deviation: 28483682.82
Coefficient of variation (CV): 0.5578280697
Kurtosis: -1.155367124
Mean: 51061759.65
Median Absolute Deviation (MAD): 24107039
Skewness: 0.01417549326
Sum: 5.041169239e+13
Variance: 8.113201871e+14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
18787089 | 1 | < 0.1%
78014865 | 1 | < 0.1%
96504141 | 1 | < 0.1%
41976140 | 1 | < 0.1%
37794122 | 1 | < 0.1%
92315976 | 1 | < 0.1%
9541752 | 1 | < 0.1%
26483807 | 1 | < 0.1%
2708973 | 1 | < 0.1%
33612096 | 1 | < 0.1%
Other values (987259) | 987259 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
78225 | 1 | < 0.1%
78226 | 1 | < 0.1%
78227 | 1 | < 0.1%
78228 | 1 | < 0.1%
78229 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
101224061 | 1 | < 0.1%
101224060 | 1 | < 0.1%
101224059 | 1 | < 0.1%
101224058 | 1 | < 0.1%
101224057 | 1 | < 0.1%

timestamp
Real number (ℝ≥0)

Distinct: 761364
Distinct (%): 77.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7555878183
Minimum: 0
Maximum: 8.077425916e+10
Zeros: 3969
Zeros (%): 0.4%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 471140.2
Q1: 462058218
median: 2305517139
Q3: 9166741308
95th percentile: 3.386438115e+10
Maximum: 8.077425916e+10
Range: 8.077425916e+10
Interquartile range (IQR): 8704683090

Descriptive statistics

Standard deviation: 1.217767912e+10
Coefficient of variation (CV): 1.611682829
Kurtosis: 8.48916776
Mean: 7555878183
Median Absolute Deviation (MAD): 2221170700
Skewness: 2.682689923
Sum: 7.459684298e+15
Variance: 1.482958688e+20
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 3969 | 0.4%
219754 | 9 | < 0.1%
310705 | 8 | < 0.1%
1028294 | 8 | < 0.1%
321106 | 8 | < 0.1%
886108 | 8 | < 0.1%
284907 | 7 | < 0.1%
473607 | 7 | < 0.1%
141453 | 7 | < 0.1%
407171 | 7 | < 0.1%
Other values (761354) | 983231 | 99.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 3969 | 0.4%
2127 | 1 | < 0.1%
3480 | 1 | < 0.1%
3485 | 1 | < 0.1%
3994 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
8.077425916e+10 | 1 | < 0.1%
8.077422286e+10 | 1 | < 0.1%
8.077418077e+10 | 1 | < 0.1%
8.077415011e+10 | 1 | < 0.1%
8.077412023e+10 | 1 | < 0.1%

user_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3935
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1086222355
Minimum: 1710599
Maximum: 2147379374
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 1710599
5th percentile: 141078962
Q1: 574033464
median: 1080358589
Q3: 1596427076
95th percentile: 2058869515
Maximum: 2147379374
Range: 2145668775
Interquartile range (IQR): 1022393612

Descriptive statistics

Standard deviation: 604002707.4
Coefficient of variation (CV): 0.5560580713
Kurtosis: -1.155781392
Mean: 1086222355
Median Absolute Deviation (MAD): 508556150
Skewness: 0.01532202043
Sum: 1.072393658e+15
Variance: 3.648192706e+17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1641944791 | 12146 | 1.2%
1816238574 | 8209 | 0.8%
346059871 | 8170 | 0.8%
814836243 | 7950 | 0.8%
1217765167 | 7792 | 0.8%
586151578 | 7101 | 0.7%
1416959515 | 6808 | 0.7%
1430438958 | 6430 | 0.7%
847874849 | 6032 | 0.6%
1357086301 | 5759 | 0.6%
Other values (3925) | 910872 | 92.3%

Minimum 5 values

Value | Count | Frequency (%)
1710599 | 22 | < 0.1%
1906069 | 30 | < 0.1%
2101969 | 253 | < 0.1%
2198581 | 28 | < 0.1%
2381110 | 62 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2147379374 | 41 | < 0.1%
2145920530 | 611 | 0.1%
2145381578 | 30 | < 0.1%
2145343445 | 107 | < 0.1%
2145253714 | 1082 | 0.1%

content_id
Real number (ℝ≥0)

Distinct: 13310
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5222.613857
Minimum: 0
Maximum: 32736
Zeros: 79
Zeros (%): < 0.1%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 320
Q1: 2081
median: 5030
Q3: 7391
95th percentile: 10688
Maximum: 32736
Range: 32736
Interquartile range (IQR): 5310

Descriptive statistics

Standard deviation: 3852.192066
Coefficient of variation (CV): 0.7375984845
Kurtosis: 7.314665309
Mean: 5222.613857
Median Absolute Deviation (MAD): 2726
Skewness: 1.613867732
Sum: 5156124760
Variance: 14839383.71
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
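Common values

Value | Count | Frequency (%)
6116 | 2197 | 0.2%
6173 | 2105 | 0.2%
4120 | 2050 | 0.2%
175 | 2019 | 0.2%
7876 | 1938 | 0.2%
7900 | 1851 | 0.2%
4492 | 1787 | 0.2%
2063 | 1772 | 0.2%
2064 | 1772 | 0.2%
2065 | 1772 | 0.2%
Other values (13300) | 968006 | 98.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 79 | < 0.1%
1 | 70 | < 0.1%
2 | 496 | 0.1%
3 | 217 | < 0.1%
4 | 307 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
32736 | 84 | < 0.1%
32625 | 74 | < 0.1%
32570 | 26 | < 0.1%
32535 | 42 | < 0.1%
32491 | 15 | < 0.1%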
 
content_type_id
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.5 MiB

Value | Count | Frequency (%)
0 | 967834 | 98.0%
1 | 19435 | 2.0%

task_container_id
Real number (ℝ≥0)

Distinct: 9210
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 760.9032736
Minimum: 0
Maximum: 9210
Zeros: 3961
Zeros (%): 0.4%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 9
Q1: 97
median: 346
Q3: 908
95th percentile: 3058
Maximum: 9210
Range: 9210
Interquartile range (IQR): 811

Descriptive statistics

Standard deviation: 1130.052798
Coefficient of variation (CV): 1.485146453
Kurtosis: 11.60782145
Mean: 760.9032736
Median Absolute Deviation (MAD): 299
Skewness: 3.005098215
Sum: 751216214
Variance: 1277019.327
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
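Common values

Value | Count | Frequency (%)
14 | 8130 | 0.8%
15 | 8080 | 0.8%
4 | 6960 | 0.7%
5 | 6945 | 0.7%
6 | 6932 | 0.7%
7 | 6853 | 0.7%
11 | 4044 | 0.4%
10 | 3988 | 0.4%
8 | 3967 | 0.4%
9 | 3967 | 0.4%
Other values (9200) | 927403 | 93.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 3961 | 0.4%
1 | 3964 | 0.4%
2 | 3957 | 0.4%
3 | 3944 | 0.4%
4 | 6960 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
9210 | 1 | < 0.1%
9209 | 1 | < 0.1%
9208 | 1 | < 0.1%
9207 | 1 | < 0.1%
9206 | 1 | < 0.1%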
 

user_answer
Real number (ℝ)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.376106208
Minimum: -1
Maximum: 3
Zeros: 274421
Zeros (%): 27.8%
Memory size: 7.5 MiB

Quantile statistics

Minimum: -1
5th percentile: 0
Q1: 0
median: 1
Q3: 3
95th percentile: 3
Maximum: 3
Range: 4
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.192983359
Coefficient of variation (CV): 0.8669268057
Kurtosis: -1.298870727
Mean: 1.376106208
Median Absolute Deviation (MAD): 1
Skewness: 0.08394766547
Sum: 1358587
Variance: 1.423209296
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
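Common values

Value | Count | Frequency (%)
0 | 274421 | 27.8%
1 | 262791 | 26.6%
3 | 253987 | 25.7%
2 | 176635 | 17.9%
-1 | 19435 | 2.0%

Minimum 5 values

Value | Count | Frequency (%)
-1 | 19435 | 2.0%
0 | 274421 | 27.8%
1 | 262791 | 26.6%
2 | 176635 | 17.9%
3 | 253987 | 25.7%

Maximum 5 values

Value | Count | Frequency (%)
3 | 253987 | 25.7%
2 | 176635 | 17.9%
1 | 262791 | 26.6%
0 | 274421 | 27.8%
-1 | 19435 | 2.0%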
 
answered_correctly
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.5 MiB

Value | Count | Frequency (%)
1 | 635571 | 64.4%
0 | 332263 | 33.7%
-1 | 19435 | 2.0%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 2
Median length: 1
Mean length: 1.019685618
Min length: 1
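The length statistics are character counts of the category labels: "1" and "0" have length 1 and "-1" has length 2, so the mean length is a count-weighted average, 1 + 19435 / 987269 ≈ 1.0197. A small arithmetic check, using only the answered_correctly value counts listed in this report:

```python
# Value counts of answered_correctly, as listed in this report.
counts = {"1": 635571, "0": 332263, "-1": 19435}

total = sum(counts.values())
lengths = [len(label) for label in counts]
# Each label's length is weighted by how often that label occurs.
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(total, max(lengths), min(lengths), mean_length)
```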

prior_question_elapsed_time
Real number (ℝ≥0)

MISSING

Distinct: 1687
Distinct (%): 0.2%
Missing: 23356
Missing (%): 2.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 25320.71289
Minimum: 0
Maximum: 300000
Zeros: 1699
Zeros (%): 0.2%
Memory size: 7.5 MiB

Quantile statistics

Minimum: 0
5th percentile: 6000
Q1: 16000
median: 21000
Q3: 29250
95th percentile: 57000
Maximum: 300000
Range: 300000
Interquartile range (IQR): 13250

Descriptive statistics

Standard deviation: 20814.85445
Coefficient of variation (CV): 0.8220485157
Kurtosis: 54.0655943
Mean: 25320.71289
Median Absolute Deviation (MAD): 6000
Skewness: 5.484981435
Sum: 2.440696433e+10
Variance: 433258165.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
17000 | 49942 | 5.1%
16000 | 46223 | 4.7%
18000 | 44479 | 4.5%
19000 | 39155 | 4.0%
15000 | 35912 | 3.6%
20000 | 34993 | 3.5%
21000 | 32531 | 3.3%
22000 | 30092 | 3.0%
23000 | 26301 | 2.7%
14000 | 25981 | 2.6%
Other values (1677) | 598304 | 60.6%
(Missing) | 23356 | 2.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1699 | 0.2%
200 | 1 | < 0.1%
250 | 5 | < 0.1%
333 | 1073 | 0.1%
600 | 12 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
300000 | 1172 | 0.1%
299000 | 5 | < 0.1%
298000 | 1 | < 0.1%
297000 | 2 | < 0.1%
296000 | 4 | < 0.1%
 
prior_question_had_explanation
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 3921
Missing (%): 0.4%
Memory size: 7.5 MiB

Value | Count | Frequency (%)
True | 870275 | 88.1%
False | 113073 | 11.5%
(Missing) | 3921 | 0.4%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for points lying exactly on a line, the steepness of that line does not affect r; only the direction of the slope decides whether r is +1 or -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
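A minimal NumPy sketch of that definition, for illustration only (the profiler computes its matrix through its own code path). The same normalization (population moments, ddof=0) is used for the covariance and both standard deviations, so the factor cancels; `pearson_r` is a hypothetical helper name.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance (divide by n); np.std also defaults to ddof=0.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / (x.std() * y.std()))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(x, y))
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in, for comparison
```

Rescaling y by a positive factor and shifting it (for example `3 * v + 7`) leaves r unchanged, while a negative factor flips its sign, which is the invariance described above.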

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
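In code, this amounts to ranking each variable and then taking Pearson's r of the ranks. A minimal sketch, assuming tied values receive the average of the ranks they span (pandas' `Series.rank(method="average")`, the usual convention for Spearman's ρ); `spearman_rho` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y) -> float:
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    # Tied values receive the average of the ranks they span.
    rx = pd.Series(x, dtype=float).rank(method="average").to_numpy()
    ry = pd.Series(y, dtype=float).rank(method="average").to_numpy()
    return float(np.corrcoef(rx, ry)[0, 1])


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # nonlinear but monotonic
print(spearman_rho(x, y))       # rank correlation of a monotonic relation
print(np.corrcoef(x, y)[0, 1])  # Pearson's r on the raw values, below 1
```

For y = x³ the ranks of x and y coincide, so ρ is 1 even though Pearson's r on the raw values is not.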

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. (This is the τ-a form; the τ-b variant adjusts the denominator for ties.)
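A direct O(n²) sketch of the pair-counting definition (τ-a), for illustration; `kendall_tau_a` is a hypothetical helper name. Library implementations differ: pandas' `Series.corr(method="kendall")` delegates, as far as I know, to SciPy's `kendalltau`, which computes τ-b by default. The two agree when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant,
    but still appear in the denominator n(n-1)/2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of change in x and y -> concordant; opposite sign -> discordant.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant pair
```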

Phik (φk)

Phik (φk) is a relatively recent correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values


Sample

First rows

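(index) | Unnamed: 0 | row_id | timestamp | user_id | content_id | content_type_id | task_container_id | user_answer | answered_correctly | prior_question_elapsed_time | prior_question_had_explanation
0 | 90286654 | 90286654 | 0 | 1917429896 | 7900 | 0 | 0 | 0 | 1 | NaN | NaN
1 | 90286655 | 90286655 | 26511 | 1917429896 | 7876 | 0 | 1 | 2 | 0 | 21000.0 | False
2 | 90286656 | 90286656 | 55312 | 1917429896 | 175 | 0 | 2 | 0 | 0 | 21000.0 | False
3 | 90286657 | 90286657 | 102436 | 1917429896 | 1278 | 0 | 3 | 0 | 0 | 23000.0 | False
4 | 90286658 | 90286658 | 188454 | 1917429896 | 2063 | 0 | 4 | 3 | 0 | 45000.0 | False
5 | 90286659 | 90286659 | 188454 | 1917429896 | 2065 | 0 | 4 | 2 | 1 | 45000.0 | False
6 | 90286660 | 90286660 | 188454 | 1917429896 | 2064 | 0 | 4 | 2 | 0 | 45000.0 | False
7 | 90286661 | 90286661 | 264296 | 1917429896 | 3365 | 0 | 5 | 0 | 0 | 26666.0 | False
8 | 90286662 | 90286662 | 264296 | 1917429896 | 3364 | 0 | 5 | 1 | 1 | 26666.0 | False
9 | 90286663 | 90286663 | 264296 | 1917429896 | 3363 | 0 | 5 | 2 | 0 | 26666.0 | False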

Last rows

(index) | Unnamed: 0 | row_id | timestamp | user_id | content_id | content_type_id | task_container_id | user_answer | answered_correctly | prior_question_elapsed_time | prior_question_had_explanation
987259 | 47484288 | 47484288 | 648640 | 1003849974 | 6173 | 0 | 12 | 0 | 0 | 15000.0 | False
987260 | 47484289 | 47484289 | 668463 | 1003849974 | 6370 | 0 | 13 | 2 | 1 | 29000.0 | False
987261 | 47484290 | 47484290 | 1619704 | 1003849974 | 6910 | 0 | 14 | 3 | 1 | 17000.0 | False
987262 | 47484291 | 47484291 | 1619704 | 1003849974 | 6908 | 0 | 14 | 0 | 1 | 17000.0 | False
987263 | 47484292 | 47484292 | 1619704 | 1003849974 | 6911 | 0 | 14 | 3 | 1 | 17000.0 | False
987264 | 47484293 | 47484293 | 1619704 | 1003849974 | 6909 | 0 | 14 | 0 | 1 | 17000.0 | False
987265 | 47484294 | 47484294 | 1906094 | 1003849974 | 7217 | 0 | 15 | 2 | 1 | 68500.0 | False
987266 | 47484295 | 47484295 | 1906094 | 1003849974 | 7216 | 0 | 15 | 3 | 1 | 68500.0 | False
987267 | 47484296 | 47484296 | 1906094 | 1003849974 | 7219 | 0 | 15 | 2 | 1 | 68500.0 | False
987268 | 47484297 | 47484297 | 1906094 | 1003849974 | 7218 | 0 | 15 | 2 | 1 | 68500.0 | False